1 Similarity queries I: Efficient similarity search and classification via rank aggregation
Ronald Fagin, Ravi Kumar, D. Sivakumar 

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on
Management of data 

Publisher: ACM Press 

Full text available: pdf(198.89 KB) Additional Information: full citation, abstract, references, citings, index terms



We propose a novel approach to performing efficient similarity search and classification in 
high dimensional data. In this framework, the database elements are vectors in a 
Euclidean space. Given a query vector in the same space, the goal is to find elements of 
the database that are similar to the query. In our approach, a small number of 
independent "voters" rank the database elements based on similarity to the query. These 
rankings are then combined by a highly efficient aggregation algorithm. ... 

Exploiting early sorting and early partitioning for decision support query processing 
J. Claussen, A. Kemper, D. Kossmann, C. Wiesner 

December 2000 The VLDB Journal - The International Journal on Very Large Data
Bases, Volume 9 Issue 3
Publisher: Springer-Verlag New York, Inc.

Full text available: pdf(478.23 KB) Additional Information: full citation, abstract, index terms

Decision support queries typically involve several joins, a grouping with aggregation, 
and/or sorting of the result tuples. We propose two new classes of query evaluation 
algorithms that can be used to speed up the execution of such queries. The algorithms are 
based on (1) early sorting and (2) early partitioning, or a combination of both. The idea is
to push the sorting and/or the partitioning to the leaves, i.e., the base relations, of the 
query evaluation plans (QEPs) and ... 



Keywords: Decision Support Systems, Early sorting and partitioning, Hash joins and 
hash teams, Performance evaluation, Query processing and optimization 



Incremental computation and maintenance of temporal aggregates
Jun Yang, Jennifer Widom 

October 2003 The VLDB Journal - The International Journal on Very Large Data
Bases, Volume 12 Issue 3
Publisher: Springer-Verlag New York, Inc.

Full text available: pdf(360.68 KB) Additional Information: full citation, abstract, citings, index terms

Abstract. We consider the problems of computing aggregation queries in temporal
databases and of maintaining materialized temporal aggregate views efficiently. The latter 
problem is particularly challenging since a single data update can cause aggregate results 
to change over the entire time line. We introduce a new index structure called the SB- 
tree, which incorporates features from both segment-trees and B-trees. SB-trees support 
fast lookup of aggregate results based on ti ... 

Keywords: Access methods, Aggregation, B-tree, Segment tree, Temporal database, 
View maintenance 



4 On parallel processing of aggregate and scalar functions in object-relational DBMS 
Michael Jaedicke, Bernhard Mitschang 

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data SIGMOD '98, Volume 27 issue 2
Publisher: ACM Press

Full text available: pdf(1.43 MB) Additional Information: full citation, abstract, references, citings, index terms

Nowadays parallel object-relational DBMS are envisioned as the next great wave, but 
there is still a lack of efficient implementation concepts for some parts of the proposed 
functionality. Thus one of the current goals for parallel object-relational DBMS is to move 
towards higher performance. In this paper we develop a framework that allows to process 
user-defined functions with data parallelism. We will describe the class of partitionable 
functions that can be processed parallelly. We will ... 

Keywords: aggregates, object-relational database systems, parallel query processing, 
user-defined functions 



Query processing for relational data: Supporting ad-hoc ranking aggregates 
Chengkai Li, Kevin Chen-Chuan Chang, Ihab F. Ilyas 

June 2006 Proceedings of the 2006 ACM SIGMOD international conference on 
Management of data SIGMOD '06 

Publisher: ACM Press 

Full text available: pdf(344.23 KB) Additional Information: full citation, abstract, references, index terms

This paper presents a principled framework for efficient processing of ad-hoc top-k 
(ranking) aggregate queries, which provide the k groups with the highest aggregates as 
results. Essential support of such queries is lacking in current systems, which process the 
queries in a naive materialize-group-sort scheme that can be prohibitively inefficient. Our 
framework is based on three fundamental principles. The Upper-Bound Principle dictates 
the requirements of early pruning, and ... 

Keywords: OLAP, aggregate query, decision support, ranking, top-k query processing 



Logics with aggregate operators
Lauri Hella, Leonid Libkin, Juha Nurmonen, Limsoon Wong

July 2001 Journal of the ACM (JACM), Volume 48 Issue 4
Publisher: ACM Press

Full text available: pdf(323.27 KB) Additional Information: full citation, abstract, references, citings, index terms

We study adding aggregate operators, such as summing up elements of a column of a
relation, to logics with counting mechanisms. The primary motivation comes from
database applications, where aggregate operators are present in all real life query
languages. Unlike other features of query languages, aggregates are not adequately 
captured by the existing logical formalisms. Consequently, all previous approaches to 
analyzing the expressive power of aggregation were only capable of producing partial ... 

Keywords: Aggregation, database, expressive power, locality, relational calculus 



Query processing: Exploiting hierarchical clustering in evaluating multidimensional 
aggregation queries
Dimitri Theodoratos 

November 2003 Proceedings of the 6th ACM international workshop on Data 
warehousing and OLAP 

Publisher: ACM Press 

Full text available: pdf(216.79 KB) Additional Information: full citation, abstract, references, index terms

Multidimensional aggregation queries constitute the single most important class of queries 
for data warehousing applications and decision support systems. The bottleneck in the 
evaluation of these queries is the join of the usually huge fact table with the restricted 
dimension tables (star-join). Recently, a multidimensional hierarchical clustering schema 
for star schemas is suggested. Subsequently, query evaluation plans for multidimensional 
queries appeared that essentially implement a ... 

Keywords: multidimensional aggregation query, multidimensional hierarchical clustering, 
query transformations, star join 



Aggregate predicate support in DBMS
Apostol (Paul) Natsev, Gene Y. C. Fuh, Weidong Chen, Chi-Huang Chiu, Jeffrey S. Vitter

January 2002 Australian Computer Science Communications, Proceedings of the
thirteenth Australasian database conference - Volume 5 ADC '02, volume 24 Issue 2
Publisher: Australian Computer Society, Inc., IEEE Computer Society Press

Full text available: pdf(1.57 MB) Additional Information: full citation, abstract, references, index terms

In this paper we consider aggregate predicates and their support in database systems. 
Aggregate predicates are the predicate equivalent to aggregate functions in that they can 
be used to search for tuples that satisfy some aggregate property over a set of tuples (as 
opposed to simply computing an aggregate property over a set of tuples). The importance 
of aggregate predicates is exemplified by many modern applications that require ranked 
search, or top-k queries. Such queries are the norm ... 

Keywords: aggregate predicates, nearest neighbor, query optimization 



Online aggregation
Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang

June 1997 ACM SIGMOD Record, Proceedings of the 1997 ACM SIGMOD international
conference on Management of data SIGMOD '97, Volume 26 issue 2
Publisher: ACM Press

Full text available: pdf(1.92 MB) Additional Information: full citation, abstract, references, citings, index terms

Aggregation in traditional database systems is performed in batch mode: a query is 
submitted, the system processes a large volume of data over a long period of time, and, 
eventually, the final answer is returned. This archaic approach is frustrating to users and 
has been abandoned in most other areas of computing. In this paper we propose a new 
online aggregation interface that permits users to both observe the progress of their
aggregation queries and control execution on ...



10 Research session: query optimization #2: Optimizing nested queries with parameter 
sort orders 

Ravindra Guravannavar, H. S. Ramanujam, S. Sudarshan 

August 2005 Proceedings of the 31st international conference on Very large data 
bases VLDB '05 

Publisher: VLDB Endowment 

Full text available: pdf(200.19 KB) Additional Information: full citation, abstract, references, index terms

Nested Iteration is an important technique for query evaluation. It is the default way of 
executing nested subqueries in SQL. Although decorrelation often results in cheaper non- 
nested plans, decorrelation is not always applicable for nested subqueries. Nested 
iteration, if implemented properly, can also win over decorrelation for several classes of 
queries. Decorrelation is also hard to apply to nested iteration in user-defined SQL 
procedures and functions. Recent research has proposed evaluati ... 

11 Visibility sorting and compositing without splitting for image layer decompositions 
John Snyder, Jed Lengyel 

July 1998 Proceedings of the 25th annual conference on Computer graphics and 
interactive techniques 

Publisher: ACM Press 

Full text available: pdf(591.53 KB) Additional Information: full citation, references, citings, index terms




Keywords: compositing, kd-tree, nonsplitting layered decomposition, occlusion cycle, 
occlusion graph, sprite, visibility sorting 



12 Performance evaluation of the statistical aggregation by categorization in the SM3
system
C. K. Baru, S. Y. W. Su

June 1984 ACM SIGMOD Record, Proceedings of the 1984 ACM SIGMOD international
conference on Management of data SIGMOD '84, volume 14 issue 2
Publisher: ACM Press

Full text available: pdf(1.32 MB) Additional Information: full citation, abstract, references

To perform a statistical aggregation operation over a large file often requires that the 
records of the file be divided into categories based on the values of the attribute(s) over 
which some statistical computation is to be performed. It is rather inefficient to perform 
the necessary data transfer, categorization and statistical computation using a single 
processor. Parallel algorithms designed for multiprocessor systems have been proposed
and their performance improvement over the conventional ... 



13 Optimal aggregation algorithms for middleware 
Ronald Fagin, Amnon Lotem, Moni Naor 

May 2001 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on
Principles of database systems

Publisher: ACM Press

Full text available: pdf(231.47 KB) Additional Information: full citation, abstract, references, citings, index terms

Assume that each object in a database has m grades, or scores, one for each of m 
attributes. For example, an object can have a color grade, that tells how red it is, and a 
shape grade, that tells how round it is. For each attribute, there is a sorted list, which lists 
each object and its grade under that attribute, sorted by grade (highest grade first).
There is some monotone aggregation function, or combining rule, such as min or average,
that combines the individ ... 

14 Research papers: stream aggregation: Semantics and evaluation techniques for 
window aggregates in data streams 

Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter A. Tucker 
June 2005 Proceedings of the 2005 ACM SIGMOD international conference on
Management of data
Publisher: ACM Press

Full text available: pdf(564.92 KB) Additional Information: full citation, abstract, references

A windowed query operator breaks a data stream into possibly overlapping subsets of 
data and computes a result over each. Many stream systems can evaluate window 
aggregate queries. However, current stream systems suffer from a lack of an explicit 
definition of window semantics. As a result, their implementations unnecessarily confuse 
window definition with physical stream properties. This confusion complicates the stream 
system, and even worse, can hurt performance both in terms of memory usage ... 

15 An NF2 relational interface for document retrieval, restructuring and aggregation 
Kalervo Jarvelin, Timo Niemi 

July 1995 Proceedings of the 18th annual international ACM SIGIR conference on
Research and development in information retrieval
Publisher: ACM Press

Full text available: pdf(985.40 KB) Additional Information: full citation, references, citings, index terms





16 Fast algorithms for universal quantification in large databases 
Goetz Graefe, Richard L. Cole 

June 1995 ACM Transactions on Database Systems (TODS), Volume 20 issue 2 
Publisher: ACM Press 

Full text available: pdf(3.51 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Universal quantification is not supported directly in most database systems despite the 
fact that it adds significant power to a system's query processing and inference 
capabilities, in particular for the analysis of many-to-many relationships and of set-valued 
attributes. One of the main reasons for this omission has been that universal 
quantification algorithms and their performance have not been explored for large 
databases. In this article, we describe and compare three known algorithms ... 

17 An array-based algorithm for simultaneous multidimensional aggregates
Yihong Zhao, Prasad M. Deshpande, Jeffrey F. Naughton 

June 1997 ACM SIGMOD Record, Proceedings of the 1997 ACM SIGMOD international
conference on Management of data SIGMOD '97, volume 26 issue 2
Publisher: ACM Press

Full text available: pdf(1.45 MB) Additional Information: full citation, abstract, references, citings, index terms

Computing multiple related group-bys and aggregates is one of the core operations of On- 
Line Analytical Processing (OLAP) applications. Recently, Gray et al. [GBLP95] proposed 
the "Cube" operator, which computes group-by aggregations over all possible subsets of 
the specified dimensions. The rapid acceptance of the importance of this operator has led 
to a variant of the Cube being proposed for the SQL standard. Several efficient algorithms 
for Relational OLAP (ROLAP) have been d ... 



Parallel algorithms for the execution of relational database operations
Dina Bitton, Haran Boral, David J. DeWitt, W. Kevin Wilkinson

September 1983 ACM Transactions on Database Systems (TODS), volume 8 issue 3
Publisher: ACM Press

Additional Information: full citation, abstract, references, citings, index terms

This paper presents and analyzes algorithms for parallel processing of relational database 
operations in a general multiprocessor framework. To analyze alternative algorithms, we 
introduce an analysis methodology which incorporates I/O, CPU, and message costs and 
which can be adjusted to fit different multiprocessor architectures. Algorithms are 
presented and analyzed for sorting, projection, and join operations. While some of these 
algorithms have been presented and analyzed previously, we ... 

Keywords: aggregate operations, database machines, join operation, parallel processing, 
projection operator, sorting 



19 Query evaluation techniques for large databases 

Goetz Graefe
June 1993 ACM Computing Surveys (CSUR), Volume 25 issue 2 
Publisher: ACM Press 



Database management systems will continue to manage large data volumes. Thus, 
efficient algorithms for accessing and manipulating large sets and sequences will be 
required to provide acceptable performance. The advent of object-oriented and extensible 
database systems will not solve this problem. On the contrary, modern data models 
exacerbate the problem: In order to manipulate large sets of complex objects as 
efficiently as today's database systems manipulate simple records, query-processi ... 

Keywords: complex query evaluation plans, dynamic query evaluation plans, extensible 
database systems, iterators, object-oriented database systems, operator model of 
parallelization, parallel algorithms, relational database systems, set-matching algorithms, 
sort-hash duality 



20 Object-based and image-based object representations
Hanan Samet

June 2004 ACM Computing Surveys (CSUR), volume 36 issue 2
Publisher: ACM Press

Full text available: pdf(1.05 MB) Additional Information: full citation, abstract, references, index terms

An overview is presented of object-based and image-based representations of objects by 
their interiors. The representations are distinguished by the manner in which they can be 
used to answer two fundamental queries in database applications: (1) Feature query: 
given an object, determine its constituent cells (i.e., their locations in space). (2) Location 
query: given a cell (i.e., a location in space), determine the identity of the object (or 
objects) of which it is a member as well as the re ... 

Keywords: Access methods, R-trees, feature query, geographic information systems 
(GIS), image space, location query, object space, octrees, pyramids, quadtrees, space- 
filling curves, spatial databases 

1 Session: database and program conversion: Towards the support of integrated views
of multiple databases: an aggregate schema facility
Donald Swartwout, James P. Fry

May 1978 Proceedings of the 1978 ACM SIGMOD international conference on
management of data
Publisher: ACM Press

Full text available: pdf(1.03 MB) Additional Information: full citation, abstract, references

Supporting multiple user views of databases is currently an important problem area in 
database management system development. An interesting facet of this problem arises 
whenever a user needs an integrated view of several distinct databases. Using traditional 
database concepts, an aggregate schema facility has been developed to address this 
problem. The basic functions of an aggregate schema facility are discussed, as well as 
their implementation in a CODASYL/DBTG-like environment. Interest in a ...

Keywords: aggregate schema, data definition languages, data translation, database 
integration, database management systems, database restructuring, distributed 
databases, dynamic translation 



Automatic high-quality reengineering of database programs by abstraction,
transformation and reimplementation
Yossi Cohen, Yishai A. Feldman

July 2003 ACM Transactions on Software Engineering and Methodology (TOSEM),
Volume 12 Issue 3
Publisher: ACM Press

Full text available: pdf(245.97 KB) Additional Information: full citation, abstract, references, index terms

Old-generation database models, such as the indexed-sequential, hierarchical, or network 
models, provide record-level access to their data, with all application logic residing in the 
hosting program. In contrast, relational databases can perform complex operations, such 
as filter, aggregation, and join, on multiple records without an external specification of the 
record-access logic. Programs written for relational databases attempt to move as much 
of the application logic as possible into the d ... 



Keywords: Database program reengineering, query graphs, temporal abstraction, the 
plan calculus 



The Logical Record Access Approach to Database Design
Toby J. Teorey, James P. Fry 

June 1980 ACM Computing Surveys (CSUR), Volume 12 issue 2 
Publisher: ACM Press 

Full text available: pdf(2.81 MB) Additional Information: full citation, references, citings, index terms



An overview of data warehousing and OLAP technology 

Surajit Chaudhuri, Umeshwar Dayal 

March 1997 ACM SIGMOD Record, Volume 26 issue 1 

Publisher: ACM Press 

Full text available: pdf(101.60 KB) Additional Information: full citation, abstract, citings, index terms

Data warehousing and on-line analytical processing (OLAP) are essential elements of 
decision support, which has increasingly become a focus of the database industry. Many 
commercial products and services are now available, and all of the principal database 
management system vendors now have offerings in these areas. Decision support places 
some rather different requirements on database technology compared to traditional on- 
line transaction processing applications. This paper provides an overview ... 

Techniques for Structuring Database Records 
Salvatore T. March 

March 1983 ACM Computing Surveys (CSUR), Volume 15 issue 1 
Publisher: ACM Press 

Full text available: pdf(3.02 MB) Additional Information: full citation, references, citings, index terms



Query evaluation techniques for large databases
Goetz Graefe 

June 1993 ACM Computing Surveys (CSUR), Volume 25 issue 2 
Publisher: ACM Press 

Full text available: pdf(9.37 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Database management systems will continue to manage large data volumes. Thus, 
efficient algorithms for accessing and manipulating large sets and sequences will be 
required to provide acceptable performance. The advent of object-oriented and extensible 
database systems will not solve this problem. On the contrary, modern data models 
exacerbate the problem: In order to manipulate large sets of complex objects as 
efficiently as today's database systems manipulate simple records, query-processi ... 

Keywords: complex query evaluation plans, dynamic query evaluation plans, extensible 
database systems, iterators, object-oriented database systems, operator model of 
parallelization, parallel algorithms, relational database systems, set-matching algorithms, 
sort-hash duality 



Progressive evaluation of nested aggregate queries
Kian-Lee Tan, Cheng Hian Goh, Beng Chin Ooi

December 2000 The VLDB Journal - The International Journal on Very Large Data
Bases, Volume 9 Issue 3
Publisher: Springer-Verlag New York, Inc.

Full text available: pdf(380.81 KB) Additional Information: full citation, abstract, index terms



In many decision-making scenarios, decision makers require rapid feedback to their 
queries, which typically involve aggregates. The traditional blocking execution model can 
no longer meet the demands of these users. One promising approach in the literature, 
called online aggregation, evaluates an aggregation query progressively as follows: as 
soon as certain data have been evaluated, approximate answers are produced with their 
respective running confidence intervals; as more data a ... 

Keywords: Approximate answers, Multi-threading, Nested aggregate queries, Online 
aggregation, Progressive query processing 



8 Aggregation everywhere: data reduction and transformation in the Phoenix data
warehouse
Steven Tolkin 

November 1999 Proceedings of the 2nd ACM international workshop on Data 
warehousing and OLAP 

Publisher: ACM Press 

Full text available: pdf(1.23 MB) Additional Information: full citation, abstract, references, index terms

This paper describes the Phoenix system, which loads a data warehouse and then reports 
against it. Between the raw atomic data of the source system and the business measures 
presented to users there are many computing environments. Aggregation occurs 
everywhere: initial bucketing by the natural keys on the mainframe, loading the fact table 
using a mapping table, maintaining aggregate tables and reporting tables in the data 
base, in the GUI, in SQL queries issued on behalf of client tools by ... 

Keywords: OLAP, SQL, aggregation, data lineage, data warehouse 




9 Similarity queries I: Efficient similarity search and classification via rank aggregation
Ronald Fagin, Ravi Kumar, D. Sivakumar

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on
Management of data
Publisher: ACM Press

Full text available: pdf(198.89 KB) Additional Information: full citation, abstract, references, citings, index terms

We propose a novel approach to performing efficient similarity search and classification in
high dimensional data. In this framework, the database elements are vectors in a 
Euclidean space. Given a query vector in the same space, the goal is to find elements of 
the database that are similar to the query. In our approach, a small number of 
independent "voters" rank the database elements based on similarity to the query. These 
rankings are then combined by a highly efficient aggregation algorithm. ... 

10 Data warehousing: Integrating compression and execution in column-oriented
database systems
Daniel Abadi, Samuel Madden, Miguel Ferreira

June 2006 Proceedings of the 2006 ACM SIGMOD international conference on
Management of data SIGMOD '06

Publisher: ACM Press

Full text available: pdf(291.42 KB) Additional Information: full citation, abstract, index terms

Column-oriented database system architectures invite a re-evaluation of how and when 
data in databases is compressed. Storing data in a column-oriented fashion greatly 
increases the similarity of adjacent records on disk and thus opportunities for 
compression. The ability to compress many adjacent tuples at once lowers the per-tuple 
cost of compression, both in terms of CPU and space overheads. In this paper, we discuss
how we extended C-Store (a column-oriented DBMS) with a compression sub-syste ...

Keywords: column-oriented databases, column-stores, database compression, query
execution 



11 A Gopher interface to relational databases 
Paul Lindner 

November 1993 Proceedings of the 21st annual ACM SIGUCCS conference on User
services
Publisher: ACM Press 

12 Tools & techniques track: frameworks for building libraries: Using collection
descriptions to enhance an aggregation of harvested item-level metadata
Muriel Foulonneau, Timothy W. Cole, Thomas G. Habing, Sarah L. Shreeves
June 2005 Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Publisher: ACM Press

Full text available: pdf(1.09 MB) Additional Information: full citation, abstract, references, index terms

As an increasing number of digital library projects embrace the harvesting of item-level 
descriptive metadata, issues of description granularity and concerns about potential loss 
of context when harvesting item-level metadata take on greater significance. Collection- 
level description can provide valuable context for item-level metadata records harvested 
from disparate and heterogeneous providers. This paper describes an ongoing experiment 
using collection-level description in concert with item-l ... 

Keywords: collection-level description, descriptive metadata, metadata aggregation, 
open archives initiative 



Data streams: On-the-fly sharing for streamed aggregation 
Sailesh Krishnamurthy, Chung Wu, Michael Franklin 

June 2006 Proceedings of the 2006 ACM SIGMOD international conference on 
Management of data SIGMOD '06 

Publisher: ACM Press 

Full text available: pdf(1.11 MB) Additional Information: full citation, abstract, references, index terms

Data streaming systems are becoming essential for monitoring applications such as 
financial analysis and network intrusion detection. These systems often have to process 
many similar but different queries over common data. Since executing each query 
separately can lead to significant scalability and performance problems, it is vital to share 
resources by exploiting similarities in the queries. In this paper we present ways to 
efficiently share streaming aggregate queries with differing periodic ... 

Keywords: aggregation, multiple-query optimization, shared processing, streaming data 

The HiPAC project: combining active databases and timing constraints 

M. J. Carey, M. Livny, R. Jauhari 

March 1988 ACM SIGMOD Record, Volume 17 issue 1

Publisher: ACM Press

Full text available: pdf(1.39 MB) Additional Information: full citation, abstract, citings, index terms

The HiPAC (High Performance Active database system) project addresses two critical
problems in time-constrained data management: the handling of timing constraints in
databases, and the avoidance of wasteful polling through the use of situation-action rules 
that are an integral part of the database and are monitored by DBMS's condition monitor. 
A rich knowledge model provides the necessary primitives for definition of timing 
constraints, situation-action rules, and precipitating events. The ... 

15 A J2EE application for process accounting, LPAR accounting, and transaction
accounting
C. Eric Wu, William P. Horn

July 2005 Proceedings of the 5th international workshop on Software and
performance WOSP '05

Publisher: ACM Press

Full text available: pdf(751.27 KB) Additional Information: full citation, abstract, references, index terms

Accounting is critical for information technology budgeting and chargeback. Traditional 
accounting in UNIX/Linux systems is known as process accounting, in which an accounting 
record is created when a process ends. System administrators typically aggregate 
accounting records based on individual users or groups. As Web and application servers 
along with databases handle requests and transactions for multiple entities in various Web 
applications and services, LPAR accounting and transaction accoun ... 

Keywords: ARM transactions, process accounting, project accounting, resource usage, 
transaction accounting 



16 Authentication and integrity in outsourced databases
Einar Mykletun, Maithili Narasimha, Gene Tsudik
May 2006 ACM Transactions on Storage (TOS), volume 2 issue 2
Publisher: ACM Press

Full text available: pdf(531.47 KB) Additional Information: full citation, abstract, references, index terms

In the Outsourced Database (ODB) model, entities outsource their data management 
needs to a third-party service provider. Such a service provider offers mechanisms for its 
clients to create, store, update, and access (query) their databases. This work provides 
mechanisms to ensure data integrity and authenticity for outsourced databases. 
Specifically, this article provides mechanisms that assure the querier that the query 
results have not been tampered with and are authentic (with respect to the ... 

Keywords: Outsourced databases, authentication, data authenticity, data integrity, 
integrity, signature aggregation, storage 

Naga K. Govindaraju, Brandon Lloyd, Wei Wang, Ming Lin, Dinesh Manocha

June 2004 Proceedings of the 2004 ACM SIGMOD international conference on
Management of data
Publisher: ACM Press

Full text available: pdf(386.13 KB) Additional Information: full citation, abstract, references

We present new algorithms for performing fast computation of several common database 
operations on commodity graphics processors. Specifically, we consider operations such 
as conjunctive selections, aggregations, and semi-linear queries, which are essential 
computational components of typical database, data warehousing, and data mining 
applications. While graphics processing units (GPUs) have been designed for fast display 
of geometric primitives, we utilize the inherent pipelining and paralleli ... 



Keywords: aggregation, graphics processor, query optimization, selection query, 
selectivity analysis, semi-linear query 
David Luebke, Mark Harris, Jens Kruger, Tim Purcell, Naga Govindaraju, Ian Buck, Cliff
Woolley, Aaron Lefohn 

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes SIGGRAPH '04
Publisher: ACM Press 

Full text available: pdf(63.03 MB) Additional Information: full citation, abstract

The graphics processor (GPU) on today's commodity video cards has evolved into an 
extremely powerful and flexible processor. The latest graphics architectures provide 
tremendous memory bandwidth and computational horsepower, with fully programmable 
vertex and pixel processing units that support vector operations up to full IEEE floating 
point precision. High level languages have emerged for graphics hardware, making this 
computational power accessible. Architecturally, GPUs are highly parallel s ... 

19 Research sessions: indexing and tuning: Transaction support for indexed summary 
views 

Goetz Graefe, Michael Zwilling 
June 2004 Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Publisher: ACM Press 

Full text available: pdf(168.70 KB) Additional Information: full citation, abstract, references

Materialized views have become a standard technique for performance improvement in 
decision support databases and for a variety of monitoring purposes. In order to avoid 
inconsistencies and thus unpredictable query results, materialized views and their indexes 
should be maintained immediately within user transaction just like indexes on ordinary 
tables. Unfortunately, the smaller a materialized view is, the higher the concurrency 
contention between queries and updates as well as among concurrent ... 

20 Tools and transformations — rigorous and otherwise— for practical database design 
Arnon Rosenthal, David Reiner 

June 1994 ACM Transactions on Database Systems (TODS), volume 19 issue 2 
Publisher: ACM Press 

Full text available: pdf(3.19 MB) Additional Information: full citation, abstract, references, citings, index terms, review

We describe the tools and theory of a comprehensive system for database design, and 
show how they work together to support multiple conceptual and logical design processes. 
The Database Design and Evaluation Workbench (DDEW) system uses a rigorous, 
information-content-preserving approach to schema transformation, but combines it with 
heuristics, guess work, and user interactions. The main contribution lies in illustrating how 
theory was adapted to a practical system, and how the consistency ... 

Keywords: applications of database theory, computer-aided software engineering, data 
model translation, database design, database equivalence, design heuristics, entity- 
relationship model, heuristics, normalization, view integration 
1 A robust, optimization-based approach for approximate answering of aggregate queries
Surajit Chaudhuri, Gautam Das, Vivek Narasayya
May 2001 ACM SIGMOD Record, Proceedings of the 2001 ACM SIGMOD international conference on Management of data SIGMOD '01, Volume 30 issue 2
Publisher: ACM Press 



Full text available: pdf(221.91 KB) Additional Information: full citation, abstract, references, citings, index terms



The ability to approximately answer aggregation queries accurately and efficiently is of 
great benefit for decision support and data mining tools. In contrast to previous sampling- 
based studies, we treat the problem as an optimization problem whose goal is to minimize 
the error in answering queries in the given workload. A key novelty of our approach is 
that we can tailor the choice of samples to be robust even for workloads that are "similar" 
but not necessarily identical ... 

Session: database and program conversion: Towards the support of integrated views
of multiple databases: an aggregate schema facility
Donald Swartwout, James P. Fry
May 1978 Proceedings of the 1978 ACM SIGMOD international conference on management of data

Publisher: ACM Press 

Full text available: pdf(1.03 MB) Additional Information: full citation, abstract, references

Supporting multiple user views of databases is currently an important problem area in 
database management system development. An interesting facet of this problem arises 
whenever a user needs an integrated view of several distinct databases. Using traditional 
database concepts, an aggregate schema facility has been developed to address this 
problem. The basic functions of an aggregate schema facility are discussed, as well as 
their implementation in a CODASYL/DBTG-like environment. Interest in a ...

Keywords: aggregate schema, data definition languages, data translation, database 
integration, database management systems, database restructuring, distributed 
databases, dynamic translation 



3 Optimizing spatial Min/Max aggregations 
Donghui Zhang, Vassilis J. Tsotras
April 2005 The VLDB Journal - The International Journal on Very Large Data Bases, Volume 14 Issue 2
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(471.77 KB) Additional Information: full citation, abstract

Aggregate computation over a collection of spatial objects appears in many real-life
applications. Aggregates are computed on values (weights) associated with spatial 
objects, for example, the temperature or rainfall over the area covered by the object. In 
this paper we concentrate on MIN/MAX aggregations: "given a query rectangle, find the 
minimum/maximum weight among all objects intersecting the query rectangle." 
Traditionally such queries have been performed as range searches. A ... 

Keywords: Indexing, Min/Max, Spatial aggregates 



Spatial Query Processing Algorithms: Improving min/max aggregation over spatial 
objects 

Donghui Zhang, Vassilis J. Tsotras 

November 2001 Proceedings of the 9th ACM international symposium on Advances in geographic information systems
Publisher: ACM Press 

Full text available: pdf(1.70 MB) Additional Information: full citation, abstract, references, citings, index terms

We examine the problem of computing MIN/MAX aggregates over a collection of spatial 
objects. Each spatial object is associated with a weight (value), for example, the average 
temperature or rainfall over the area covered by the object. Given a query rectangle, the 
MIN/MAX problem computes the minimum/maximum weight among all objects 
intersecting the query rectangle. Traditionally such queries have been performed as range 
search queries. Assuming that the objects are indexed by a spatial access m ... 

Keywords: Min/Max, indexing, spatial aggregates 



Physical interface: TAG: a Tiny AGgregation service for ad-hoc sensor networks 
Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong 
December 2002 ACM SIGOPS Operating Systems Review, Volume 36 Issue SI
Publisher: ACM Press 

Full text available: pdf(2.19 MB) Additional Information: full citation, abstract, references, citings

We present the Tiny AGgregation (TAG) service for aggregation in low-power, distributed, 
wireless environments. TAG allows users to express simple, declarative queries and have 
them distributed and executed efficiently in networks of low-power, wireless sensors. We 
discuss various generic properties of aggregates, and show how those properties affect 
the performance of our in-network approach. We include a performance study
demonstrating the advantages of our approach over traditional centralize ... 

Techniques for Structuring Database Records 
Salvatore T. March 

March 1983 ACM Computing Surveys (CSUR), Volume 15 Issue 1
Publisher: ACM Press 

Full text available: pdf(3.02 MB) Additional Information: full citation, references, citings, index terms



Complete answer aggregates for treelike databases: a novel approach to combine 
querying and navigation 



Holger Meuss, Klaus U. Schulz 

April 2001 ACM Transactions on Information Systems (TOIS), Volume 19 issue 2 
Publisher: ACM Press 



The use of markup languages like SGML, HTML or XML for encoding the structure of
documents or linguistic data has led to many databases where entries are adequately
described as trees. In this context querying formalisms are interesting that offer the
possibility to refer both to textual content and logical structure. We consider models where
the structure specified in a query is not only used as a filter, but also for selecting and
presenting different parts of the data. If answers are formaliz ... 

Keywords: SGML, XML, answer presentation, information retrieval, logic, query 
languages, semistructured data, structured documents, tree databases, tree matching 



Tools & techniques track: frameworks for building libraries: Using collection
descriptions to enhance an aggregation of harvested item-level metadata

Muriel Foulonneau, Timothy W. Cole, Thomas G. Habing, Sarah L. Shreeves 

June 2005 Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries 

Publisher: ACM Press 

Full text available: pdf(1.09 MB) Additional Information: full citation, abstract, references, index terms

As an increasing number of digital library projects embrace the harvesting of item-level 
descriptive metadata, issues of description granularity and concerns about potential loss 
of context when harvesting item-level metadata take on greater significance. Collection- 
level description can provide valuable context for item-level metadata records harvested 
from disparate and heterogeneous providers. This paper describes an ongoing experiment 
using collection-level description in concert with item-l ... 

Keywords: collection-level description, descriptive metadata, metadata aggregation, 
open archives initiative 



Usage and relationships: An architecture for the aggregation and analysis of
scholarly usage data

Johan Bollen, Herbert Van de Sompel 

June 2006 Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries 
JCDL '06 

Publisher: ACM Press 

Full text available: pdf(753.90 KB) Additional Information: full citation, abstract, references, index terms




Although recording of usage data is common in scholarly information services, its 
exploitation for the creation of value-added services remains limited due to concerns 
regarding, among others, user privacy, data validity, and the lack of accepted standards 
for the representation, sharing and aggregation of usage data. This paper presents a 
technical, standards-based architecture for sharing usage information, which we have 
designed and implemented. In this architecture, OpenURL-compliant linking ... 

Keywords: OAI-PMH, aggregation, analysis, architecture, digital libraries, openURL, 
standards, usage data 
Aggregate structure identification and its application to program analysis
G. Ramalingam, John Field, Frank Tip 

January 1999 Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages

Publisher: ACM Press 

Full text available: pdf(1.92 MB) Additional Information: full citation, references, citings, index terms



Performance evaluation of the statistical aggregation by categorization in the SM3 
system 

C. K. Baru, S. Y. W. Su
June 1984 ACM SIGMOD Record, Proceedings of the 1984 ACM SIGMOD international conference on Management of data SIGMOD '84, Volume 14 issue 2
Publisher: ACM Press 

Full text available: pdf(1.32 MB) Additional Information: full citation, abstract, references

To perform a statistical aggregation operation over a large file often requires that the 
records of the file be divided into categories based on the values of the attribute(s) over 
which some statistical computation is to be performed. It is rather inefficient to perform 
the necessary data transfer, categorization and statistical computation using a single 
processor. Parallel algorithms designed for multiprocessor systems have been proposed
and their performance improvement over the conventional ... 

Research papers: stream aggregation: Multiple aggregations over data streams 
Rui Zhang, Nick Koudas, Beng Chin Ooi, Divesh Srivastava 

June 2005 Proceedings of the 2005 ACM SIGMOD international conference on Management of data

Publisher: ACM Press 

Full text available: pdf(403.02 KB) Additional Information: full citation, abstract, references

Monitoring aggregates on IP traffic data streams is a compelling application for data 
stream management systems. The need for exploratory IP traffic data analysis naturally 
leads to posing related aggregation queries on data streams, that differ only in the choice 
of grouping attributes. In this paper, we address this problem of efficiently computing 
multiple aggregations over high speed data streams, based on a two-level LFTA/HFTA 
DSMS architecture, inspired by Gigascope. Our first contribution ...

Online aggregation

Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang 

June 1997 ACM SIGMOD Record, Proceedings of the 1997 ACM SIGMOD international conference on Management of data SIGMOD '97, Volume 26 issue 2
Publisher: ACM Press 

Full text available: pdf(1.92 MB) Additional Information: full citation, abstract, references, citings, index terms

Aggregation in traditional database systems is performed in batch mode: a query is 
submitted, the system processes a large volume of data over a long period of time, and, 
eventually, the final answer is returned. This archaic approach is frustrating to users and 
has been abandoned in most other areas of computing. In this paper we propose a new 
online aggregation interface that permits users to both observe the progress of their 
aggregation queries and control execution on ... 

Technical poster session 1: multimedia analysis, processing, and retrieval:
Calculation of an aggregated level of interest function for recorded events
Rahul Nair 

October 2004 Proceedings of the 12th annual ACM international conference on 
Multimedia 

Publisher: ACM Press 

Full text available: pdf(128.26 KB) Additional Information: full citation, abstract, references, index terms

As recording technology becomes pervasive there is a dramatic increase in the number of 
events being recorded in multimedia. The challenge now facing users is to quickly view 
the recorded content in the least amount of time. While there are several methods to 
analyze video based on ambient noise, scene changes, slide transitions, etc., these 
techniques merely find features in the recording, they do not reveal which sections are 
important. 

This paper presents a method to calculate a ...

Keywords: bookmark aggregation, level of interest, multimedia, skimming, video 
browsing, visualization 



15 Historical spatio-temporal aggregation 
Yufei Tao, Dimitris Papadias 

January 2005 ACM Transactions on Information Systems (TOIS), Volume 23 Issue 1
Publisher: ACM Press 

Full text available: pdf(1.42 MB) Additional Information: full citation, abstract, references, index terms

Spatio-temporal databases store information about the positions of individual objects over 
time. However, in many applications such as traffic supervision or mobile communication 
systems, only summarized data, like the number of cars in an area for a specific period, 
or phone-calls serviced by a cell each day, is required. Although this information can be 
obtained from operational databases, its computation is expensive, rendering online 
processing inapplicable. In this paper, we present special ... 

Keywords: Aggregation, access methods, cost models 



16 Compressed data cubes for OLAP aggregate query approximation on continuous 
dimensions 

Jayavel Shanmugasundaram, Usama Fayyad, P. S. Bradley 

August 1999 Proceedings of the fifth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Publisher: ACM Press 

Full text available: pdf(1.12 MB) Additional Information: full citation, references, citings, index terms



Keywords: OLAP, approximate query answering, clustering, data cubes, data mining, 
density estimation 



17 Progressive evaluation of nested aggregate queries 
Kian-Lee Tan, Cheng Hian Goh, Beng Chin Ooi 

December 2000 The VLDB Journal - The International Journal on Very Large Data Bases, Volume 9 Issue 3
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(380.81 KB) Additional Information: full citation, abstract, index terms

In many decision-making scenarios, decision makers require rapid feedback to their 
queries, which typically involve aggregates. The traditional blocking execution model can 
no longer meet the demands of these users. One promising approach in the literature, 
called online aggregation, evaluates an aggregation query progressively as follows: as 
soon as certain data have been evaluated, approximate answers are produced with their
respective running confidence intervals; as more data a ...

Keywords: Approximate answers, Multi-threading, Nested aggregate queries, Online 
aggregation, Progressive query processing 



18 Supporting education: Metadata aggregation and "automated digital libraries": a
retrospective on the NSDL experience
Carl Lagoze, Dean Krafft, Tim Cornwell, Naomi Dushay, Dean Eckstrom, John Saylor

June 2006 Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
JCDL '06
Publisher: ACM Press 

Full text available: pdf(346.87 KB) Additional Information: full citation, abstract, references, index terms

Over three years ago, the Core Integration team of the National Science Digital Library 
(NSDL) implemented a digital library based on metadata aggregation using Dublin Core 
and OAI-PMH. The initial expectation was that such low-barrier technologies would be 
relatively easy to automate and administer. While this architectural choice permitted rapid 
deployment of a production NSDL, our three years of experience have contradicted our 
original expectations of easy automation and low people cost. We ... 

Keywords: NSDL, OAI-PMH, architecture, interoperability, metadata 



19 Processing time-constrained aggregate queries in CASE-DB
Wen-Chi Hou, Gultekin Ozsoyoglu

June 1993 ACM Transactions on Database Systems (TODS), Volume 18 issue 2
Publisher: ACM Press 

Full text available: pdf(2.62 MB) Additional Information: full citation, abstract, references, citings, index terms, review

In this paper, we present an algorithm to strictly control the time to process an estimator 
for an aggregate relational query. The algorithm, implemented in a prototype database
management system called CASE-DB, iteratively samples from input relations and
evaluates the associated estimator until the time quota expires. In order to estimate the 
time cost of a query, CASE-DB uses adaptive time cost formulas. The formulas are 
adaptive in that the parameters of the formulas can be ... 

Keywords: estimation, relational algebra, risk of overspending, sampling, selectivity, 
time constraints 